

Parallel FFT Algorithms for the Distributed-Memory Parallel
Computer HITACHI SR8000

DAISUKE TAKAHASHI†

In this paper, we propose high-performance parallel one-dimensional fast Fourier transform
(FFT) algorithms for distributed-memory parallel computers with vector symmetric
multiprocessor (SMP) nodes. The four-step FFT algorithm can be altered into a five-step FFT
algorithm to expand the innermost loop length. We use the four-step and five-step FFT
algorithms to implement the parallel one-dimensional FFT algorithms. Because the proposed
algorithms use a cyclic distribution, all-to-all communication takes place only once.
Moreover, the input data and output data are both in natural order. Performance results of
one-dimensional FFTs on the HITACHI SR8000, a distributed-memory parallel computer with
(pseudo) vector SMP nodes, are reported. We obtained a performance of about 38 GFLOPS on a
16-node SR8000.



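1. Introduction

The fast Fourier transform (hereafter FFT) is widely used in scientific and engineering
computation. This paper targets distributed-memory parallel computers whose nodes are SMPs
(symmetric multiprocessors).
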
† Information Technology Center, University of Tokyo
In particular, we consider machines in which each node is an SMP built from vector
processors (a vector SMP node). On a distributed-memory parallel computer with vector SMP
nodes, the innermost loop length within each node should be made as long as possible.

In this paper, we show that the four-step FFT algorithm can be altered into a five-step FFT
algorithm, and that parallel one-dimensional FFT algorithms can be constructed from the
four-step and five-step FFT algorithms.

Many parallel one-dimensional FFT algorithms require the data to be rearranged between
nodes. To reduce the amount of communication, Edelman et al. proposed an approach based on
the fast multipole method.

We implemented these parallel one-dimensional FFT algorithms on the HITACHI SR8000, a
distributed-memory parallel computer with (pseudo) vector SMP nodes, and evaluated their
performance.

The rest of this paper is organized as follows. Section 2 briefly describes the HITACHI
SR8000. Section 3 describes the four-step FFT algorithm, Section 4 the five-step FFT
algorithm proposed in this paper, and Section 5 the parallel one-dimensional FFT algorithms.
Section 6 describes the FFT algorithm within a vector SMP node. Section 7 presents
performance results for the proposed parallel one-dimensional FFT algorithms.


2. HITACHI SR8000

The HITACHI SR8000 is a distributed-memory parallel computer with (pseudo) vector SMP
nodes. An SR8000 system consists of 4 to 128 nodes. Each node contains eight RISC processors
with a pseudo-vector processing facility, giving a theoretical peak performance of 8 GFLOPS
per node, and has 8 GB of main memory. The inter-node communication bandwidth is 1 GB/s
(unidirectional) and 2 GB/s (bidirectional).


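3. Four-Step FFT Algorithm

The FFT is an algorithm for rapidly computing the discrete Fourier transform (hereafter
DFT). The DFT is defined by

$$ y_k = \sum_{j=0}^{n-1} x_j \omega_n^{jk}, \quad 0 \le k \le n-1, \qquad (1) $$

where $\omega_n = e^{-2\pi i/n}$.

If $n$ has factors $n_1$ and $n_2$ ($n = n_1 \times n_2$), the indices $j$ and $k$ in
Eq. (1) can be written as

$$ j = j_1 + j_2 n_1, \quad k = k_2 + k_1 n_2. \qquad (2) $$

Then $x$ and $y$ in Eq. (1) can be regarded as two-dimensional arrays (stored columnwise):

$$ x_j = x(j_1, j_2), \quad 0 \le j_1 \le n_1 - 1, \quad 0 \le j_2 \le n_2 - 1, \qquad (3) $$

$$ y_k = y(k_2, k_1), \quad 0 \le k_1 \le n_1 - 1, \quad 0 \le k_2 \le n_2 - 1. \qquad (4) $$

Accordingly, Eq. (1) can be transformed into Eq. (5):

$$ y(k_2, k_1) = \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} x(j_1, j_2)\, \omega_{n_2}^{j_2 k_2}\, \omega_{n_1 n_2}^{j_1 k_2}\, \omega_{n_1}^{j_1 k_1}. \qquad (5) $$

Eq. (5) leads to the following four-step FFT algorithm.

Step 1: $n_1$ simultaneous $n_2$-point multirow FFTs
    $x_1(j_1, k_2) = \sum_{j_2=0}^{n_2-1} x(j_1, j_2)\, \omega_{n_2}^{j_2 k_2}$
Step 2: Twiddle factor multiplication
    $x_2(j_1, k_2) = x_1(j_1, k_2)\, \omega_{n_1 n_2}^{j_1 k_2}$
Step 3: Transposition
    $x_3(k_2, j_1) = x_2(j_1, k_2)$
Step 4: $n_2$ simultaneous $n_1$-point multirow FFTs
    $y(k_2, k_1) = \sum_{j_1=0}^{n_1-1} x_3(k_2, j_1)\, \omega_{n_1}^{j_1 k_1}$

The factor $\omega_{n_1 n_2}^{j_1 k_2}$ in Step 2 is the twiddle factor. The features of this
four-step FFT algorithm are as follows.

• When $n_1 = n_2 = \sqrt{n}$, $\sqrt{n}$ simultaneous $\sqrt{n}$-point multirow FFTs are
  performed in Steps 1 and 4.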
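
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dft` and `four_step_fft` are hypothetical, and a direct $O(n^2)$ DFT stands in for the multirow FFT kernel so that the sketch stays short and easy to check against Eq. (1).

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, Eq. (1): y_k = sum_j x_j * w_n^(j*k)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """Length n = n1 * n2 transform following Steps 1-4 (illustrative sketch)."""
    n = n1 * n2
    assert len(x) == n
    # View x as x(j1, j2) with j = j1 + j2 * n1 (Eqs. (2) and (3)).
    rows = [[x[j1 + j2 * n1] for j2 in range(n2)] for j1 in range(n1)]
    # Step 1: n1 independent n2-point transforms (direct DFT as a stand-in).
    x1 = [dft(row) for row in rows]
    # Step 2: twiddle factor multiplication by w_n^(j1 * k2).
    x2 = [[x1[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
           for k2 in range(n2)] for j1 in range(n1)]
    # Step 3: transposition, x3(k2, j1) = x2(j1, k2).
    x3 = [[x2[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 4: n2 independent n1-point transforms.
    y2d = [dft(col) for col in x3]
    # Store y(k2, k1) at k = k2 + k1 * n2 (Eqs. (2) and (4)).
    y = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + k1 * n2] = y2d[k2][k1]
    return y
```

Comparing `four_step_fft(x, 4, 8)` with `dft(x)` for a 32-point input is a quick way to confirm that the index mappings of Eqs. (2) to (4) and the twiddle factor of Step 2 fit together.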
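4. Five-Step FFT Algorithm

This section extends the conventional four-step FFT algorithm described in Section 3 to a
three-dimensional formulation.

If $n$ has factors $n_1$, $n_2$ and $n_3$ ($n = n_1 \times n_2 \times n_3$), the indices $j$
and $k$ in Eq. (1) can be written as

$$ j = j_1 + j_2 n_1 + j_3 n_1 n_2, \quad k = k_3 + k_2 n_3 + k_1 n_2 n_3. \qquad (6) $$

Then $x$ and $y$ can be regarded as three-dimensional arrays (stored columnwise):

$$ x_j = x(j_1, j_2, j_3), \quad 0 \le j_1 \le n_1 - 1, \quad 0 \le j_2 \le n_2 - 1, \quad 0 \le j_3 \le n_3 - 1, \qquad (7) $$

$$ y_k = y(k_3, k_2, k_1), \quad 0 \le k_1 \le n_1 - 1, \quad 0 \le k_2 \le n_2 - 1, \quad 0 \le k_3 \le n_3 - 1. \qquad (8) $$

Accordingly, Eq. (1) can be transformed into Eq. (9):

$$ y(k_3, k_2, k_1) = \sum_{j_1=0}^{n_1-1} \sum_{j_2=0}^{n_2-1} \sum_{j_3=0}^{n_3-1} x(j_1, j_2, j_3)\, \omega_{n_3}^{j_3 k_3}\, \omega_{n_2 n_3}^{j_2 k_3}\, \omega_{n_2}^{j_2 k_2}\, \omega_{n}^{j_1 k_3}\, \omega_{n_1 n_2}^{j_1 k_2}\, \omega_{n_1}^{j_1 k_1}. \qquad (9) $$

Eq. (9) leads to the following five-step FFT algorithm.

Step 1: $n_1 n_2$ simultaneous $n_3$-point multirow FFTs
    $x_1(j_1, j_2, k_3) = \sum_{j_3=0}^{n_3-1} x(j_1, j_2, j_3)\, \omega_{n_3}^{j_3 k_3}$
Step 2: Twiddle factor multiplication and transposition
    $x_2(k_3, j_1, j_2) = x_1(j_1, j_2, k_3)\, \omega_{n_2 n_3}^{j_2 k_3}$
Step 3: $n_3 n_1$ simultaneous $n_2$-point multirow FFTs
    $x_3(k_3, j_1, k_2) = \sum_{j_2=0}^{n_2-1} x_2(k_3, j_1, j_2)\, \omega_{n_2}^{j_2 k_2}$
Step 4: Twiddle factor multiplication and rearrangement
    $x_4(k_3, k_2, j_1) = x_3(k_3, j_1, k_2)\, \omega_{n}^{j_1 k_3}\, \omega_{n_1 n_2}^{j_1 k_2}$
Step 5: $n_3 n_2$ simultaneous $n_1$-point multirow FFTs
    $y(k_3, k_2, k_1) = \sum_{j_1=0}^{n_1-1} x_4(k_3, k_2, j_1)\, \omega_{n_1}^{j_1 k_1}$

The features of this five-step FFT algorithm are as follows.

• When $n_1 = n_2 = n_3 = n^{1/3}$, $n^{2/3}$ simultaneous $n^{1/3}$-point multirow FFTs are
  performed in Steps 1, 3 and 5.
• Compared with the conventional four-step FFT algorithm, the vector length (innermost loop
  length) is longer.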
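
As a check on Eq. (9), the five steps can be written out directly. The sketch below is a minimal illustration, not the implementation used in the paper: `w`, `dft` and `five_step_fft` are hypothetical names, intermediate arrays are stored as dictionaries keyed by index tuples for readability, and a direct DFT replaces the multirow FFT kernel.

```python
import cmath


def w(n, e):
    """Twiddle factor w_n^e with w_n = exp(-2*pi*i/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def dft(x):
    """Direct O(n^2) DFT used as a stand-in for the FFT kernel."""
    n = len(x)
    return [sum(x[j] * w(n, j * k) for j in range(n)) for k in range(n)]


def five_step_fft(x, n1, n2, n3):
    """Length n = n1 * n2 * n3 transform following Steps 1-5 (illustrative sketch)."""
    n = n1 * n2 * n3
    assert len(x) == n
    # Step 1: n1*n2 transforms of length n3 over j3, with j = j1 + j2*n1 + j3*n1*n2.
    x1 = {}
    for j1 in range(n1):
        for j2 in range(n2):
            f = dft([x[j1 + j2 * n1 + j3 * n1 * n2] for j3 in range(n3)])
            for k3 in range(n3):
                x1[j1, j2, k3] = f[k3]
    # Step 2: twiddle multiplication and transposition.
    x2 = {(k3, j1, j2): x1[j1, j2, k3] * w(n2 * n3, j2 * k3)
          for j1 in range(n1) for j2 in range(n2) for k3 in range(n3)}
    # Step 3: n3*n1 transforms of length n2 over j2.
    x3 = {}
    for k3 in range(n3):
        for j1 in range(n1):
            f = dft([x2[k3, j1, j2] for j2 in range(n2)])
            for k2 in range(n2):
                x3[k3, j1, k2] = f[k2]
    # Step 4: twiddle multiplication and rearrangement.
    x4 = {(k3, k2, j1): x3[k3, j1, k2] * w(n, j1 * k3) * w(n1 * n2, j1 * k2)
          for k3 in range(n3) for k2 in range(n2) for j1 in range(n1)}
    # Step 5: n3*n2 transforms of length n1 over j1; k = k3 + k2*n3 + k1*n2*n3.
    y = [0j] * n
    for k3 in range(n3):
        for k2 in range(n2):
            f = dft([x4[k3, k2, j1] for j1 in range(n1)])
            for k1 in range(n1):
                y[k3 + k2 * n3 + k1 * n2 * n3] = f[k1]
    return y
```

The factor sizes need not be equal for the identity to hold, so small cases such as 2 x 3 x 4 exercise every twiddle factor in Steps 2 and 4.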



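5. Parallel One-Dimensional FFT Algorithms

5.1 Parallel One-Dimensional FFT Based on the Four-Step FFT

Parallel one-dimensional FFT algorithms based on the four-step FFT and the six-step FFT have
been proposed. Here we apply the idea of the four-step FFT to a parallel one-dimensional FFT
algorithm.

Let $N = N_1 \times N_2$ be the number of points of the one-dimensional FFT. The
one-dimensional array $x(N)$ is regarded as a two-dimensional array $x(N_1, N_2)$ and is
distributed along the first dimension ($N_1$). With the local index $J_r(m)$ on the $m$-th
node, the global index $J_r$ is given by the cyclic distribution

$$ J_r = J_r(m) \times P + m, \quad 0 \le m \le P - 1, \quad 1 \le r \le 2, \qquad (10) $$

where $P$ is the number of nodes.

Using the distributed array $x(N_1, N_2)$, the parallel one-dimensional FFT algorithm based
on the four-step FFT is as follows.

Step 1: $(N_1/P)$ simultaneous $N_2$-point multirow FFTs
    $x_1(J_1, K_2) = \sum_{J_2=0}^{N_2-1} x(J_1, J_2)\, \omega_{N_2}^{J_2 K_2}$
Step 2: Twiddle factor multiplication and intra-node transposition
    $x_2(P_2, K_2, J_1) = x_1(J_1, K_2)\, \omega_{N_1 N_2}^{J_1 K_2}$
Step 3: Intra-node transposition
    $x_3(K_2, J_1, P_2) = x_2(P_2, K_2, J_1)$
Step 4: All-to-all communication
    $x_4(K_2, J_1, P_1) = x_3(K_2, J_1, P_2)$
Step 5: Intra-node transposition
    $x_5(K_2, J_1) = x_4(K_2, J_1, P_1)$
Step 6: $(N_2/P)$ simultaneous $N_1$-point multirow FFTs
    $y(K_2, K_1) = \sum_{J_1=0}^{N_1-1} x_5(K_2, J_1)\, \omega_{N_1}^{J_1 K_1}$

In this algorithm, multirow FFTs are performed in Steps 1 and 6. Steps 3 and 5 each perform
one intra-node transposition. All-to-all communication takes place only in Step 4.

The features of the parallel one-dimensional FFT algorithm based on the four-step FFT are as
follows.

• When $N_1 = N_2 = \sqrt{N}$, $\sqrt{N}/P$ simultaneous $\sqrt{N}$-point multirow FFTs are
  performed in Steps 1 and 6.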
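
The effect of the cyclic distribution can be illustrated with a small single-process simulation. The sketch below makes several assumptions that are not in the source: `parallel_four_step_fft` is a hypothetical name, the $P$ nodes are modelled as Python dictionaries, the intra-node transpositions are folded into the dictionary indexing, and a direct DFT replaces the multirow FFT kernel. It shows that one all-to-all exchange suffices and that node $m$ both starts and ends with the elements whose global index is congruent to $m$ modulo $P$.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT used as a stand-in for the FFT kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def parallel_four_step_fft(x, n1, n2, p):
    """Simulate P nodes; returns one dict {global index K: y_K} per node."""
    n = n1 * n2
    assert len(x) == n and n1 % p == 0 and n2 % p == 0
    # Cyclic distribution (Eq. (10)): node m owns rows J1 = m, m + P, m + 2P, ...
    # Because J = J1 + J2 * N1 and P divides N1, these are the x_J with J mod P == m.
    sent = []
    for m in range(p):
        node = {}
        for j1 in range(m, n1, p):
            # Local N2-point transform followed by the twiddle factor w_N^(J1*K2).
            f = dft([x[j1 + j2 * n1] for j2 in range(n2)])
            node[j1] = [f[k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
                        for k2 in range(n2)]
        sent.append(node)
    # The single all-to-all exchange: element (J1, K2) moves to node K2 mod P.
    recv = [dict() for _ in range(p)]
    for node in sent:
        for j1, row in node.items():
            for k2, value in enumerate(row):
                recv[k2 % p][(k2, j1)] = value
    # Local N1-point transforms; node m produces y_K for K = K2 + K1 * N2, K2 mod P == m.
    out = []
    for m in range(p):
        node_out = {}
        for k2 in range(m, n2, p):
            f = dft([recv[m][(k2, j1)] for j1 in range(n1)])
            for k1 in range(n1):
                node_out[k2 + k1 * n2] = f[k1]
        out.append(node_out)
    return out
```

Since $P$ divides $N_2$, every output index held by node $m$ satisfies $K \bmod P = m$, which is the same cyclic, natural-order layout as the input.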

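5.2 Parallel One-Dimensional FFT Based on the Five-Step FFT

In the parallel one-dimensional FFT algorithm based on the four-step FFT described in
Section 5.1, when $N_1 = N_2 = \sqrt{N}$ and the number of nodes is $P$, the innermost loop
length is $\sqrt{N}/P$. Here we apply the five-step FFT described in Section 4 to the
parallel one-dimensional FFT algorithm.

Let $N = N_1 \times N_2 \times N_3$ be the number of points of the one-dimensional FFT. The
parallel one-dimensional FFT algorithm based on the five-step FFT is as follows.

Step 1: $(N_1/P) \cdot N_2$ simultaneous $N_3$-point multirow FFTs
    $x_1(J_1, J_2, K_3) = \sum_{J_3=0}^{N_3-1} x(J_1, J_2, J_3)\, \omega_{N_3}^{J_3 K_3}$
Step 2: Twiddle factor multiplication and intra-node transposition
    $x_2(K_3, J_1, J_2) = x_1(J_1, J_2, K_3)\, \omega_{N_2 N_3}^{J_2 K_3}$
Step 3: $N_3 \cdot (N_1/P)$ simultaneous $N_2$-point multirow FFTs
    $x_3(K_3, J_1, K_2) = \sum_{J_2=0}^{N_2-1} x_2(K_3, J_1, J_2)\, \omega_{N_2}^{J_2 K_2}$
Step 4: Twiddle factor multiplication and intra-node transposition
    $x_4(P_3, K_3, K_2, J_1) = x_3(K_3, J_1, K_2)\, \omega_{N}^{J_1 K_3}\, \omega_{N_1 N_2}^{J_1 K_2}$
Step 5: Intra-node transposition
    $x_5(K_3, K_2, J_1, P_3) = x_4(P_3, K_3, K_2, J_1)$
Step 6: All-to-all communication
    $x_6(K_3, K_2, J_1, P_1) = x_5(K_3, K_2, J_1, P_3)$
Step 7: Intra-node transposition
    $x_7(K_3, K_2, J_1) \equiv x_7(K_3, K_2, P_1, J_1) = x_6(K_3, K_2, J_1, P_1)$
Step 8: $(N_3/P) \cdot N_2$ simultaneous $N_1$-point multirow FFTs
    $y(K_3, K_2, K_1) = \sum_{J_1=0}^{N_1-1} x_7(K_3, K_2, J_1)\, \omega_{N_1}^{J_1 K_1}$

In this algorithm, multirow FFTs are performed in Steps 1, 3 and 8. Twiddle factor
multiplication is performed in Steps 2 and 4. Steps 5 and 7 perform intra-node
transpositions. All-to-all communication takes place only in Step 6.

The features of the parallel one-dimensional FFT algorithm based on the five-step FFT are as
follows.

• When $N_1 = N_2 = N_3 = N^{1/3}$, $N^{2/3}/P$ simultaneous $N^{1/3}$-point multirow FFTs
  are performed in Steps 1, 3 and 8.
• Compared with the parallel one-dimensional FFT algorithm based on the four-step FFT, ...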

5.3 Comparison of Innermost Loop Lengths

We compare the innermost loop length within each node for the parallel one-dimensional FFT
algorithms based on the four-step FFT and the five-step FFT. For the algorithm based on the
four-step FFT, with $N = N_1 \times N_2$ and $N_1 = N_2 = \sqrt{N}$, the innermost loop
length is $\sqrt{N}/P$.

For the algorithm based on the five-step FFT, with $N = N_1 \times N_2 \times N_3$ and
$N_1 = N_2 = N_3 = N^{1/3}$, the innermost loop length is $N^{2/3}/P$.

For example, with $N = 2^{24}$ and $P = 128$, the innermost loop length is 32
($= \sqrt{2^{24}}/128$) for the algorithm based on the four-step FFT, whereas it is 512
($= (2^{24})^{2/3}/128$) for the algorithm based on the five-step FFT.

Therefore, with respect to the innermost loop length, the parallel one-dimensional FFT
algorithm based on the five-step FFT is more advantageous than the one based on the
four-step FFT.
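
The arithmetic in this comparison is easy to reproduce. The helper below is a hypothetical illustration (the name `innermost_loop_lengths` is not from the source); it assumes $N = 2^k$ with $k$ divisible by 6 so that both $\sqrt{N}$ and $N^{2/3}$ are exact powers of two.

```python
def innermost_loop_lengths(log2_n, p):
    """Return (sqrt(N)/P, N^(2/3)/P) for N = 2**log2_n on P nodes."""
    assert log2_n % 6 == 0, "sketch assumes exact square and cube roots"
    sqrt_n = 1 << (log2_n // 2)              # N^(1/2)
    n_two_thirds = 1 << (2 * log2_n // 3)    # N^(2/3)
    assert sqrt_n % p == 0 and n_two_thirds % p == 0
    return sqrt_n // p, n_two_thirds // p


four_step_len, five_step_len = innermost_loop_lengths(24, 128)
print(four_step_len, five_step_len)  # 32 512
```

For $N = 2^{24}$ and $P = 128$ the five-step formulation gives a loop that is 16 times longer, which is $N^{1/6}$.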



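Table 1  Real operation counts for FFT kernels based on the radix-2, 4 and 8 DIF Stockham
algorithm (HITACHI SR8000)

                                                    Radix 2   Radix 4   Radix 8
  Loads and stores                                      8        16        32
  Multiplications                                       4        12        32
  Additions                                             6        22        66
  Total floating-point operations (n log2 n)          5.000     4.250     4.083
  Floating-point instructions                           8        28        84
  Floating-point instructions / loads and stores      1.000     1.750     2.625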
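
The derived rows of Table 1 follow from the raw counts. In the sketch below, which is only an illustration with hypothetical names, a radix-$r$ kernel is assumed to process $r$ points per butterfly over $\log_2 n / \log_2 r$ stages, so its cost per $n \log_2 n$ is (multiplications + additions) / ($r \log_2 r$).

```python
from math import log2

# Per-butterfly counts taken from Table 1.
KERNELS = {
    2: {"loads_stores": 8, "mul": 4, "add": 6, "fp_instr": 8},
    4: {"loads_stores": 16, "mul": 12, "add": 22, "fp_instr": 28},
    8: {"loads_stores": 32, "mul": 32, "add": 66, "fp_instr": 84},
}


def flops_per_nlogn(radix):
    """Floating-point operations expressed as a multiple of n*log2(n)."""
    k = KERNELS[radix]
    return (k["mul"] + k["add"]) / (radix * log2(radix))


def instr_per_load_store(radix):
    """Ratio of floating-point instructions to loads and stores."""
    k = KERNELS[radix]
    return k["fp_instr"] / k["loads_stores"]


for r in (2, 4, 8):
    print(r, round(flops_per_nlogn(r), 3), instr_per_load_store(r))
```

The radix-8 kernel performs slightly fewer operations overall and, more importantly for a memory-bound loop, issues 2.625 floating-point instructions per load or store.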



6. FFT Algorithm within a Vector SMP Node

Many FFT algorithms have been proposed. As the FFT algorithm within a vector SMP node, we
used a multirow FFT algorithm based on the Stockham algorithm. Unlike the Cooley-Tukey
algorithm, the Stockham algorithm does not require a bit-reversal permutation, but it
requires twice the array size of the Cooley-Tukey algorithm.

Each processor of an SR8000 node has multiply-add instructions of the three-operand form
a = ±a ± bc. Goedecker proposed radix-2, 3, 4 and 5 FFT kernels that reduce the number of
floating-point instructions by exploiting multiply-add instructions, but those kernels
assume the four-operand form d = ±a ± bc. In our program we used the decimation-in-frequency
(hereafter DIF) Stockham algorithm.

Table 1 shows the real operation counts of FFT kernels based on the radix-2, 4 and 8 DIF
Stockham algorithm. As Table 1 shows, a higher-radix FFT kernel has a higher ratio of
floating-point instructions to loads and stores: the ratio for the radix-8 FFT is 2.625
times that of the radix-2 FFT and 1.5 times that of the radix-4 FFT.

For power-of-two FFTs, radix-4 and radix-8 kernels are combined so that radix-2 kernels are
avoided. Specifically, an FFT of $n = 2^p$ ($p \ge 2$) points is computed as
$n = 4^q 8^r$ ($0 \le q \le 2$, $r \ge 0$).

Thus the FFT within a vector SMP node is computed with multirow FFTs, based on the radix-4
and radix-8 DIF Stockham algorithm, that perform ns simultaneous n-point FFTs. Figures 1 and
2 show these kernels. In the radix-4 and radix-8 multirow FFTs, the innermost loop runs over
the ns simultaneous n-point FFTs.

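
The factorization $n = 2^p = 4^q 8^r$ with $0 \le q \le 2$ amounts to solving $p = 2q + 3r$. The function below is a sketch under the assumption that the smallest admissible $q$ is chosen (which maximizes the number of radix-8 stages); the name `radix_4_8_decomposition` is hypothetical and the source does not state how ties are resolved.

```python
def radix_4_8_decomposition(p):
    """Return (q, r) with 2**p == 4**q * 8**r, 0 <= q <= 2, r >= 0, for p >= 2."""
    if p < 2:
        raise ValueError("p must be at least 2")
    for q in range(3):
        rest = p - 2 * q
        if rest >= 0 and rest % 3 == 0:
            return q, rest // 3
    raise AssertionError("unreachable for p >= 2")


for p in (2, 3, 4, 5, 10):
    print(p, radix_4_8_decomposition(p))
```

Every $p \ge 2$ has such a decomposition because $2q$ for $q = 0, 1, 2$ covers all three residues modulo 3.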
6.1 Parallelization within a Vector SMP Node

On each node of the SR8000, the outermost loop of the FFT kernel is divided among the eight
processors of the node. In the radix-4 and radix-8 FFT kernels shown in Figures 1 and 2,
however, in the last iteration of the do t loop (t = p) the do j loop has length 1, so it
cannot be distributed over the eight processors of the node.
    n = 4^p,  ω_q = e^{-2πi/q}
    do t = 1, p
      l = 4^{p-t},  m = 4^{t-1}
      complex*16 X_{t-1}(ns, m, 4*l), X_t(ns, 4*m, l)
      do j = 1, l
        do k = 1, m
          do row = 1, ns
            c0 = X_{t-1}(row, k, j)
            c1 = X_{t-1}(row, k, j + l)
            c2 = X_{t-1}(row, k, j + 2*l)
            c3 = X_{t-1}(row, k, j + 3*l)
            d0 = c0 + c2
            d1 = c0 - c2
            d2 = c1 + c3
            d3 = -i(c1 - c3)
            X_t(row, k, j)         = d0 + d2
            X_t(row, k + m, j)     = ω_{4l}^{j-1} (d1 + d3)
            X_t(row, k + 2*m, j)   = ω_{4l}^{2(j-1)} (d0 - d2)
            X_t(row, k + 3*m, j)   = ω_{4l}^{3(j-1)} (d1 - d3)
          end do
        end do
      end do
    end do

Fig. 1  Multirow FFT of ns simultaneous n-point transforms based on the radix-4 DIF Stockham
algorithm
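
The kernel of Fig. 1 can be exercised with a small Python transcription. The sketch below is an illustration under stated assumptions: it handles a single row (ns = 1), uses 0-based indices so that the twiddle exponent is $j$ rather than $j - 1$, stores $X(k, j)$ at the linear position $k + m j$ as a column-major Fortran array would, and the names `stockham_radix4` and `dft` are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def stockham_radix4(x):
    """Radix-4 DIF Stockham FFT for n = 4**p points; output is in natural order."""
    n = len(x)
    p = 0
    while 4 ** p < n:
        p += 1
    assert 4 ** p == n, "length must be a power of 4"
    a = list(x)
    l, m = n // 4, 1                      # l = 4^(p-t), m = 4^(t-1) at stage t = 1
    for _ in range(p):
        b = [0j] * n
        for j in range(l):
            w1 = cmath.exp(-2j * cmath.pi * j / (4 * l))
            w2 = w1 * w1
            w3 = w2 * w1
            for k in range(m):
                c0 = a[k + m * j]
                c1 = a[k + m * (j + l)]
                c2 = a[k + m * (j + 2 * l)]
                c3 = a[k + m * (j + 3 * l)]
                d0 = c0 + c2
                d1 = c0 - c2
                d2 = c1 + c3
                d3 = -1j * (c1 - c3)
                # Output array has shape (4*m, l): X_t(k', j) sits at k' + 4*m*j.
                b[k + 4 * m * j] = d0 + d2
                b[k + m + 4 * m * j] = w1 * (d1 + d3)
                b[k + 2 * m + 4 * m * j] = w2 * (d0 - d2)
                b[k + 3 * m + 4 * m * j] = w3 * (d1 - d3)
        a = b
        l //= 4
        m *= 4
    return a
```

No bit-reversal pass is needed: each stage writes into a second array in an order that leaves the final result in natural order, at the cost of the extra work array mentioned in Section 6.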

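Therefore, when l < 8, the do j and do k loops are interchanged, and the do k loop is
divided among the eight processors of each node. In the remaining cases, list vectors are
used for the do j and do k loops.

7. Performance Results

In the performance evaluation, N-point FFTs were measured while varying the number of
nodes P. The FFTs were computed in double-precision complex arithmetic. As a
distributed-memory parallel computer with (pseudo) vector SMP nodes, we used 1 to 16 nodes
of a HITACHI SR8000 (128 nodes, 1024 GB of main memory, theoretical peak performance of
1024 GFLOPS).

7.1 Performance on the HITACHI SR8000

MPI was used as the inter-node communication library on the SR8000.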



    n = 8^p,  ω_q = e^{-2πi/q}
    do t = 1, p
      l = 8^{p-t},  m = 8^{t-1}
      complex*16 X_{t-1}(ns, m, 8*l), X_t(ns, 8*m, l)
      do j = 1, l
        do k = 1, m
          do row = 1, ns
            c0 = X_{t-1}(row, k, j)
            c1 = X_{t-1}(row, k, j + l)
            c2 = X_{t-1}(row, k, j + 2*l)
            c3 = X_{t-1}(row, k, j + 3*l)
            c4 = X_{t-1}(row, k, j + 4*l)
            c5 = X_{t-1}(row, k, j + 5*l)
            c6 = X_{t-1}(row, k, j + 6*l)
            c7 = X_{t-1}(row, k, j + 7*l)
            d0 = c0 + c4
            d1 = c0 - c4
            d2 = c2 + c6
            d3 = -i(c2 - c6)
            d4 = c1 + c5
            d5 = c1 - c5
            d6 = c3 + c7
            d7 = c3 - c7
            e0 = d0 + d2
            e1 = d0 - d2
            e2 = d4 + d6
            e3 = -i(d4 - d6)
            e4 = (√2/2)(d5 - d7)
            e5 = -(√2/2) i (d5 + d7)
            e6 = d1 + e4
            e7 = d1 - e4
            e8 = d3 + e5
            e9 = d3 - e5
            X_t(row, k, j)         = e0 + e2
            X_t(row, k + m, j)     = ω_{8l}^{j-1} (e6 + e8)
            X_t(row, k + 2*m, j)   = ω_{8l}^{2(j-1)} (e1 + e3)
            X_t(row, k + 3*m, j)   = ω_{8l}^{3(j-1)} (e7 - e9)
            X_t(row, k + 4*m, j)   = ω_{8l}^{4(j-1)} (e0 - e2)
            X_t(row, k + 5*m, j)   = ω_{8l}^{5(j-1)} (e7 + e9)
            X_t(row, k + 6*m, j)   = ω_{8l}^{6(j-1)} (e1 - e3)
            X_t(row, k + 7*m, j)   = ω_{8l}^{7(j-1)} (e6 - e8)
          end do
        end do
      end do
    end do

Fig. 2  Multirow FFT of ns simultaneous n-point transforms based on the radix-8 DIF Stockham
algorithm
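
The butterfly at the core of Fig. 2 can be checked in isolation. The sketch below transcribes the d and e intermediates for one group of eight inputs with all stage twiddle factors equal to 1 (the case j = 1 in the figure) and compares the result with a direct 8-point DFT. The names `radix8_butterfly` and `dft8` are hypothetical.

```python
import cmath
import math


def radix8_butterfly(c):
    """8-point DFT of c[0..7] using the d/e intermediates of the radix-8 kernel."""
    c0, c1, c2, c3, c4, c5, c6, c7 = c
    h = math.sqrt(2.0) / 2.0
    d0 = c0 + c4
    d1 = c0 - c4
    d2 = c2 + c6
    d3 = -1j * (c2 - c6)
    d4 = c1 + c5
    d5 = c1 - c5
    d6 = c3 + c7
    d7 = c3 - c7
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = d4 + d6
    e3 = -1j * (d4 - d6)
    e4 = h * (d5 - d7)
    e5 = -h * 1j * (d5 + d7)
    e6 = d1 + e4
    e7 = d1 - e4
    e8 = d3 + e5
    e9 = d3 - e5
    # Outputs in natural frequency order y_0 .. y_7.
    return [e0 + e2, e6 + e8, e1 + e3, e7 - e9,
            e0 - e2, e7 + e9, e1 - e3, e6 - e8]


def dft8(c):
    """Direct 8-point DFT used only as a reference."""
    return [sum(c[s] * cmath.exp(-2j * cmath.pi * s * r / 8) for s in range(8))
            for r in range(8)]
```

Only the two products with $\sqrt{2}/2$ are genuine multiplications inside the butterfly; multiplication by $-i$ is a swap of real and imaginary parts with a sign change.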

All programs were written in FORTRAN. The compiler was Hitachi's optimizing FORTRAN77
V01-00, and the optimization options -nolimit -rdma -W0,'opt(o(ss))' were specified.
Table 2  Performance of the parallel one-dimensional FFT based on the four-step FFT
(HITACHI SR8000)

             P = 1            P = 2            P = 4            P = 8            P = 16
    N     Time   GFLOPS    Time   GFLOPS    Time   GFLOPS    Time   GFLOPS    Time   GFLOPS
   2^20    ...     ...      ...     ...      ...    3.398   0.01969  5.324   0.01279  8.199
   ...

Other entries legible in the source, whose rows and columns cannot be determined: 3.424,
0.02610, 3.543, 5.741, 8.919, 0.56249, 33.022.

Table 3  Performance of the parallel one-dimensional FFT based on the five-step FFT
(HITACHI SR8000)

Tables 2 and 3 show the performance of the N-point parallel one-dimensional FFTs based on
the four-step FFT and the five-step FFT.

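Here, execution times are given in seconds, and the GFLOPS values for an $n = 2^m$ point FFT
are based on an operation count of $5 n \log_2 n$.

As Tables 2 and 3 show, the parallel one-dimensional FFT algorithm based on the five-step
FFT achieves higher performance than the one based on the four-step FFT. This is because,
within each node, the five-step FFT has a longer innermost loop than the four-step FFT.

On a 16-node SR8000, the parallel one-dimensional FFT algorithm based on the five-step FFT
achieves about 38 GFLOPS for $N = 2^{26}$.

We also measured the time required for the all-to-all communication in the parallel
one-dimensional FFT. Table 4 shows the performance (MB/sec) of all-to-all communication on
the SR8000. The communication volume of the all-to-all communication is
$(P - 1) \times (16N/P^2)$ bytes per node.

According to Table 3, for the parallel one-dimensional FFT algorithm based on the five-step
FFT, the execution time of an $N = 2^{26}$ point FFT on 16 nodes is 0.22823 seconds, while
the all-to-all communication takes 0.08299 seconds. More than one third of the execution
time is thus spent in all-to-all communication.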
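
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is a hypothetical illustration: the function names are not from the source, the operation count $5 N \log_2 N$ is the usual convention for complex FFTs, and $N = 2^{26}$ is assumed for the 16-node measurement because it is the size for which 0.22823 seconds corresponds to about 38 GFLOPS.

```python
from math import log2


def fft_gflops(n, seconds):
    """GFLOPS of an n-point complex FFT counted as 5*n*log2(n) operations."""
    return 5 * n * log2(n) / seconds / 1e9


def alltoall_bytes_per_node(n, p):
    """Bytes each node sends in the all-to-all: (P - 1) * 16 * N / P**2."""
    return (p - 1) * 16 * n // (p * p)


n, p = 2 ** 26, 16
total_time, comm_time = 0.22823, 0.08299

print(round(fft_gflops(n, total_time), 1))     # about 38.2
print(alltoall_bytes_per_node(n, p))           # 62914560 bytes, 60 MiB
print(round(comm_time / total_time, 3))        # about 0.364, more than 1/3
```

The same formula reproduces the recovered Table 2 entries for $N = 2^{20}$ to within the rounding of the printed times.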

i i -e, «e*co3feyij-^7c FFT r Xa 
(Dx^\z, xtjt&tiimcr-^nmtiZLfzm^^z 

^mm(omm^ i mitznmr^^m^t^n^mi^m 

8. Summary

In this paper, we proposed parallel one-dimensional FFT algorithms for distributed-memory
parallel computers with vector SMP nodes.

To make the innermost loop longer than in the conventional four-step FFT algorithm, we
proposed a five-step FFT, and we presented parallel one-dimensional FFT algorithms based on
the four-step and five-step FFT algorithms.

For the FFT within a node, using radix-4 and radix-8 FFTs gives a higher ratio of
floating-point instructions to loads and stores within the node than a radix-2 FFT.

We implemented the proposed algorithms on the HITACHI SR8000 and evaluated their
performance. As a result, about 38 GFLOPS was obtained on a 16-node SR8000.

Acknowledgment  Part of this work was supported by research grant (A) No. 10780166.